Semantic segmentation of UAV aerial remote sensing images offers a more efficient and convenient alternative to traditional surveying and mapping. To obtain a lightweight model while also improving accuracy, this research develops a new lightweight and efficient network for extracting ground features from UAV aerial remote sensing images, called LDMCNet. This research also develops a powerful lightweight backbone network for the proposed semantic segmentation model, called LDCNet, which we hope can become the backbone network of a new generation of lightweight semantic segmentation algorithms. The proposed model uses dual multi-scale context modules, namely the Atrous Spatial Pyramid Pooling (ASPP) module and the Object Context Representation (OCR) module. In addition, this research constructs a private dataset for semantic segmentation of aerial remote sensing images from drones, containing 2431 training images, 945 validation images, and 475 test images. The proposed model performs well on this dataset: with only 1.4M parameters and 5.48G floating-point operations (FLOPs), it achieves a mean intersection over union (mIoU) of 71.12%, 7.88% higher than the baseline model. To further verify its effectiveness, the proposed model was also trained on the public datasets "LoveDA" and "CITY-OSM", achieving excellent results with mIoU of 65.27% and 74.39%, respectively.
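For context, a minimal PyTorch sketch of an ASPP-style module follows; the dilation rates and channel widths are illustrative assumptions, not LDMCNet's exact configuration.

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Parallel atrous (dilated) convolutions at several rates, concatenated
    and fused by a 1x1 convolution to capture multi-scale context."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):  # rates are illustrative
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            ) for r in rates
        ])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))
```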
Deep-neural-network-based time series classification methods easily overfit the UCR datasets, which is caused by the few-shot nature of those datasets. Therefore, to alleviate overfitting and further improve accuracy, we first propose Label Smoothing for InceptionTime (LSTime), which uses soft labels that carry more information than hard labels. Next, instead of manually tuning the soft labels as in LSTime, Knowledge Distillation for InceptionTime (KDTime) is proposed to automatically generate soft labels through a teacher model. Finally, to rectify soft labels that the teacher model predicts incorrectly, Knowledge Distillation with Calibration for InceptionTime (KDCTime) is proposed, which contains two optional calibration strategies: calibration by translating (KDCT) and calibration by reordering (KDCR). Experimental results show that the accuracy of KDCTime is promising, while its inference time is two orders of magnitude faster than ROCKET, with an acceptable training time overhead.
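For reference, here is a minimal sketch of the standard distillation objective that such a teacher-generated-soft-label setup builds on; the temperature T and weight alpha are illustrative hyperparameters, and the KDCT/KDCR calibration steps are not reproduced here.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    """Cross-entropy on hard labels plus KL divergence to the teacher's
    temperature-softened soft labels (Hinton-style distillation sketch)."""
    hard = F.cross_entropy(student_logits, targets)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients after temperature softening
    return alpha * hard + (1 - alpha) * soft
```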
The Earth's surface is continually changing, and identifying these changes plays an important role in urban planning and sustainable development. Although change detection techniques have been successfully developed over many years, they remain accessible mainly to experts in related fields. To give every user flexible access to change information and help them better understand land-cover changes, we introduce a novel task: change-detection-based visual question answering (CDVQA) on multi-temporal aerial images. In particular, multi-temporal images can be queried to obtain high-level change-based information according to the content changes between two input images. We first build a CDVQA dataset containing multi-temporal image-question-answer triplets using an automatic question-answer generation method. A baseline CDVQA framework is then designed in this work; it contains four parts: multi-temporal feature encoding, multi-temporal fusion, multi-modal fusion, and answer prediction. In addition, we introduce a change-enhancing module into the multi-temporal feature encoding, aiming to incorporate more change-related information. Finally, we study the effects of different backbones and multi-temporal fusion strategies on CDVQA performance. The experimental results provide useful insights for developing better CDVQA models, which will be important for future research on this task. We will make our dataset and code publicly available.
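A schematic sketch of such a four-stage pipeline is given below; the fusion layers, classifier head, and dimensions are placeholder assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class CDVQABaseline(nn.Module):
    """Four stages: encode each timestamp, fuse across time, fuse the visual
    summary with the question, and predict an answer class."""
    def __init__(self, img_encoder, txt_encoder, dim=512, num_answers=100):
        super().__init__()
        self.img_encoder = img_encoder                  # shared visual backbone
        self.txt_encoder = txt_encoder                  # question encoder
        self.temporal_fuse = nn.Linear(2 * dim, dim)    # multi-temporal fusion
        self.multimodal_fuse = nn.Linear(2 * dim, dim)  # image-question fusion
        self.classifier = nn.Linear(dim, num_answers)   # answer prediction

    def forward(self, img_t1, img_t2, question):
        v1, v2 = self.img_encoder(img_t1), self.img_encoder(img_t2)
        v = torch.relu(self.temporal_fuse(torch.cat([v1, v2], dim=-1)))
        q = self.txt_encoder(question)
        z = torch.relu(self.multimodal_fuse(torch.cat([v, q], dim=-1)))
        return self.classifier(z)
```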
Multi-agent formation control together with obstacle avoidance is one of the most intensively studied topics in the field of multi-agent systems. Although some classic controllers, such as model predictive control (MPC) and fuzzy control, have achieved a certain measure of success, most of them require precise global information that is not accessible in harsh environments. Some reinforcement learning (RL) based methods, on the other hand, adopt a leader-follower structure to organize the behaviors of different agents, which weakens the cooperation between agents and creates bottlenecks in maneuverability and robustness. In this paper, we propose a distributed formation and obstacle avoidance method based on multi-agent reinforcement learning (MARL). Agents in our system use only local and relative information to make decisions and control themselves in a distributed manner, and can quickly reorganize into a new topology in case of any disconnection. Compared with baselines (classic control methods and other RL-based methods), our method achieves better formation error, formation convergence speed, and success rate of obstacle avoidance. The feasibility of our method is verified by simulations and by a hardware implementation with Ackermann-steering vehicles.
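As a rough illustration of the kind of per-step objective such distributed agents might optimize, the sketch below penalizes deviation from a desired formation and proximity to obstacles; every term and weight here is an assumption, not the paper's actual reward.

```python
import numpy as np

def formation_reward(positions, desired_offsets, obstacles, collision_dist=0.5):
    """Hypothetical reward: negative mean formation error minus a penalty
    whenever any agent comes within collision_dist of an obstacle."""
    center = positions.mean(axis=0)  # formation center from agent positions
    formation_error = np.linalg.norm(
        positions - (center + desired_offsets), axis=1
    ).mean()
    min_obstacle_dist = min(
        np.linalg.norm(p - o) for p in positions for o in obstacles
    )
    collision_penalty = 10.0 if min_obstacle_dist < collision_dist else 0.0
    return -formation_error - collision_penalty
```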
Most publicly available datasets for image classification are single-labeled, while images in our daily life are inherently multi-labeled. This annotation gap makes many pre-trained single-label classification models fail in practical scenarios. The annotation issue is even more pressing for aerial images: aerial data collected from sensors naturally cover a relatively large land area with multiple labels, while widely available annotated aerial datasets (e.g., UCM, AID) are single-labeled. Since manually annotating multi-label aerial images would be time- and labor-intensive, we propose a novel self-correction integrated domain adaptation (SCIDA) method for automatic multi-label learning. SCIDA is weakly supervised, i.e., it automatically learns a multi-label image classification model from massive, publicly available single-label images. To achieve this, we propose a novel label-wise self-correction (LWC) module to better explore the underlying label correlations. This module also makes unsupervised domain adaptation (UDA) from single- to multi-label data possible. For model training, the proposed model uses only single-label information and requires no prior knowledge of multi-label data; it then predicts labels for multi-label aerial images. In our experiments, trained with the single-labeled MAI-AID-S and MAI-UCM-S datasets, the proposed model is tested directly on our collected multi-scene aerial image (MAI) dataset.
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, while its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
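One common way to realize such implicit alignment is to add coordinate-derived positional embeddings to the tokens of each modality before the transformer, as sketched below; the MLP design here is an assumption, not CMT's exact encoding.

```python
import torch.nn as nn

class CoordinateEncoding(nn.Module):
    """Adds a positional embedding computed from 3D coordinates to a token
    sequence, so tokens from different modalities share a spatial frame."""
    def __init__(self, dim=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, tokens, coords):
        # tokens: (B, N, dim); coords: (B, N, 3) 3D points tied to each token
        return tokens + self.mlp(coords)
```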
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
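A minimal sketch of NAIVEATTACK-style trigger injection, stamping a fixed patch onto raw images before distillation starts; the patch location and contents are illustrative. DOORPING would instead keep re-optimizing the trigger throughout distillation.

```python
import torch

def add_trigger(images, trigger, x0=0, y0=0):
    """Stamp a trigger patch of shape (C, h, w) onto a batch of images at
    the (x0, y0) corner, returning the poisoned copies."""
    poisoned = images.clone()
    _, h, w = trigger.shape
    poisoned[:, :, y0:y0 + h, x0:x0 + w] = trigger
    return poisoned
```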
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features in a Transformer-like framework. Our key insights are twofold: first, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features; second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After these steps, performance on novel classes improves significantly over our strong baseline. Additionally, our framework can easily be extended to incremental FSIS with minor modifications. Benchmarking on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots; e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
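A hedged sketch of the masked-average-pooling recipe commonly used to turn support masks into class centers that re-weight query features; this is a generic few-shot segmentation pattern, not RefT's exact module.

```python
import torch.nn.functional as F

def masked_class_center(support_feat, support_mask):
    """Average support features inside the ground-truth mask to get one
    class-center vector per image. support_feat: (B, C, H, W);
    support_mask: (B, 1, H, W) float mask in {0, 1}."""
    mask = F.interpolate(support_mask, size=support_feat.shape[-2:], mode="nearest")
    center = (support_feat * mask).sum(dim=(2, 3)) / mask.sum(dim=(2, 3)).clamp(min=1.0)
    return center  # (B, C)

def reweight_query(query_feat, center):
    """Scale query features by their cosine similarity to the class center."""
    sim = F.cosine_similarity(query_feat, center[:, :, None, None], dim=1)
    return query_feat * sim.unsqueeze(1)
```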
This paper focuses on designing efficient models with low parameter counts and FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, the trade-off between model accuracy and constrained resources still needs further improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetv2 and the effective Transformer in ViT, inductively abstracting a general concept of the Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance even though the same framework is shared. Motivated by this phenomenon, we deduce a simple yet efficient modern Inverted Residual Mobile Block (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependency and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase Efficient MOdel (EMO) based only on a series of iRMBs for dense applications. Extensive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods; e.g., our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 accuracy, surpassing SoTA CNN-/Transformer-based models while trading off model accuracy and efficiency well.
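A hedged sketch of an iRMB-like block that pairs self-attention for long-distance interactions with a depthwise convolution for short-distance dependency; the expansion ratio, head count, and ordering are assumptions, not EMO's exact instantiation.

```python
import torch.nn as nn

class iRMBSketch(nn.Module):
    """Inverted-residual-style block: global token mixing via multi-head
    self-attention, then local mixing via an expanded depthwise conv."""
    def __init__(self, dim, expand=4, heads=4):
        super().__init__()
        hidden = dim * expand
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.expand = nn.Conv2d(dim, hidden, 1)                           # expand
        self.dw = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)  # depthwise
        self.project = nn.Conv2d(hidden, dim, 1)                          # project back
        self.act = nn.GELU()

    def forward(self, x):  # x: (B, C, H, W)
        B, C, H, W = x.shape
        t = self.norm(x.flatten(2).transpose(1, 2))  # (B, HW, C) tokens
        t = self.attn(t, t, t)[0]                    # long-distance interactions
        x = x + t.transpose(1, 2).reshape(B, C, H, W)
        return x + self.project(self.act(self.dw(self.expand(x))))  # local mixing
```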
Benefiting from its capability to exploit intrinsic supervision information, contrastive learning has recently achieved promising performance in the field of deep graph clustering. However, we observe that two drawbacks of the positive and negative sample construction mechanisms prevent existing algorithms from improving further. 1) The quality of positive samples heavily depends on carefully designed data augmentations, and inappropriate augmentations easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are unreliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) that mines the intrinsic supervision information in high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct positive samples from the same high-confidence cluster in the two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function that pulls together samples from the same cluster while pushing apart those from other clusters, by maximizing and minimizing the cross-view cosine similarity between positive and negative samples, respectively. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with existing state-of-the-art algorithms.
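A simplified sketch of such a cluster-guided objective, treating cross-view embeddings of the same node as positives and other clusters' centers as negatives; this is a compression of the idea, not CCGC's full loss.

```python
import torch
import torch.nn.functional as F

def cluster_contrastive_loss(z1, z2, centers, labels):
    """z1, z2: (N, D) cross-view embeddings; centers: (K, D) cluster
    centers; labels: (N,) high-confidence cluster assignments."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    centers = F.normalize(centers, dim=1)
    pos = (z1 * z2).sum(dim=1)                      # cross-view cosine similarity
    neg = z1 @ centers.t()                          # similarity to every center
    neg = neg.scatter(1, labels.unsqueeze(1), 0.0)  # exclude own-cluster center
    return (-pos + neg.mean(dim=1)).mean()          # maximize pos, minimize neg
```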